leader strategy
- Asia > China (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Leisure & Entertainment > Games (0.97)
- Government (0.68)
Stackelberg POMDP: A Reinforcement Learning Approach for Economic Design
Brero, Gianluca, Eden, Alon, Chakrabarti, Darshan, Gerstgrasser, Matthias, Greenwald, Amy, Li, Vincent, Parkes, David C.
We introduce a reinforcement learning framework for economic design where the interaction between the environment designer and the participants is modeled as a Stackelberg game. In this game, the designer (leader) sets up the rules of the economic system, while the participants (followers) respond strategically. We integrate algorithms for determining followers' response strategies into the leader's learning environment, providing a formulation of the leader's learning problem as a POMDP that we call the Stackelberg POMDP. We prove that the optimal leader's strategy in the Stackelberg game is the optimal policy in our Stackelberg POMDP under a limited set of possible policies, establishing a connection between solving POMDPs and Stackelberg games. We solve our POMDP under a limited set of policy options via the centralized training with decentralized execution framework. For the specific case of followers that are modeled as no-regret learners, we solve an array of increasingly complex settings, including problems of indirect mechanism design where there is turn-taking and limited communication by agents. We demonstrate the effectiveness of our training framework through ablation studies. We also give convergence results for no-regret learners to a Bayesian version of a coarse-correlated equilibrium, extending known results to the case of correlated types.
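The two-level structure described in this abstract — an outer leader optimizing the rules of the game while no-regret follower learning runs inside the leader's environment — can be illustrated with a toy sketch. Everything here is hypothetical (the scalar "rule", the payoff functions, the use of exhaustive search in place of POMDP policy training); it only shows the shape of the nested loops, not the paper's actual algorithm.

```python
import numpy as np

# Hypothetical toy instance: the leader sets a scalar "rule" (e.g. a fee
# level), a follower is a no-regret learner over discrete actions, and the
# leader's reward is its design objective evaluated at the follower's
# learned play.
LEADER_ACTIONS = np.linspace(0.0, 1.0, 11)   # candidate rule settings
FOLLOWER_ACTIONS = 3                         # follower's discrete actions

def follower_payoff(rule, action):
    # Toy payoff: the follower prefers lower actions when the rule is high.
    return -abs(action / (FOLLOWER_ACTIONS - 1) - (1.0 - rule))

def leader_reward(rule, follower_dist):
    # Toy objective: the leader wants high follower actions at a low rule.
    expected_action = follower_dist @ np.arange(FOLLOWER_ACTIONS)
    return expected_action - 0.5 * rule

def follower_response(rule, rounds=200, eta=0.5):
    # Inner loop: multiplicative weights (a no-regret learner) responding
    # to the fixed rule, standing in for the follower dynamics that run
    # inside the leader's learning environment.
    w = np.ones(FOLLOWER_ACTIONS)
    for _ in range(rounds):
        gains = np.array([follower_payoff(rule, a)
                          for a in range(FOLLOWER_ACTIONS)])
        w *= np.exp(eta * gains)
    return w / w.sum()

# Outer loop: evaluate each candidate rule against the follower's converged
# no-regret play and keep the best -- a brute-force stand-in for training
# the leader's POMDP policy.
best_rule = max(LEADER_ACTIONS,
                key=lambda r: leader_reward(r, follower_response(r)))
print(best_rule)
```

In this toy instance the follower concentrates on the action matching the rule, so the leader's best setting is the lowest rule; the point is only that the follower's equilibration is part of the leader's environment rollout.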
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Education (0.68)
- Leisure & Entertainment > Games (0.67)
- Government (0.46)
Non-myopic learning in repeated stochastic games
In repeated stochastic games (RSGs), an agent must quickly adapt to the behavior of previously unknown associates, who may themselves be learning. This machine-learning problem is particularly challenging due, in part, to the presence of multiple (even infinite) equilibria and inherently large strategy spaces. In this paper, we introduce a method to reduce the strategy space of two-player general-sum RSGs to a handful of expert strategies. This process, called Mega, effectually reduces an RSG to a bandit problem. We show that the resulting strategy space preserves several important properties of the original RSG, thus enabling a learner to produce robust strategies within a reasonably small number of interactions. To better establish strengths and weaknesses of this approach, we empirically evaluate the resulting learning system against other algorithms in three different RSGs.
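The reduction described above — from a large strategy space to a handful of expert strategies, turning the repeated game into a bandit problem — can be sketched with a standard bandit algorithm over experts. The payoff means, noise model, and use of UCB1 are all illustrative assumptions, not the paper's construction; only the "bandit over experts" shape is taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: the strategy space has already been reduced (offline)
# to a few expert strategies; each interaction with the unknown associate
# yields a noisy payoff for the expert played.
TRUE_MEANS = np.array([0.3, 0.7, 0.5])   # each expert's mean payoff (toy)

def pull(expert):
    # One interaction of the repeated game, abstracted as a noisy sample.
    return TRUE_MEANS[expert] + rng.normal(0.0, 0.1)

def ucb1(num_experts, horizon):
    # UCB1 over the expert strategies: the bandit problem the reduction
    # produces. Returns the most-played expert after `horizon` rounds.
    counts = np.zeros(num_experts)
    sums = np.zeros(num_experts)
    for t in range(horizon):
        if t < num_experts:
            arm = t                       # play each expert once
        else:
            bonus = np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(sums / counts + bonus))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return int(np.argmax(counts))

print(ucb1(len(TRUE_MEANS), 500))
```

With a small expert set, the learner identifies a strong expert within a few hundred interactions, which is the practical payoff of the reduction: regret scales with the number of experts rather than the size of the original strategy space.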
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Texas (0.04)
- (2 more...)
- Leisure & Entertainment > Games (0.68)
- Energy > Power Industry (0.48)
Robust Learning for Repeated Stochastic Games via Meta-Gaming
Crandall, Jacob W. (Masdar Institute of Science and Technology)
In repeated stochastic games (RSGs), an agent must quickly adapt to the behavior of previously unknown associates, who may themselves be learning. This machine-learning problem is particularly challenging due, in part, to the presence of multiple (even infinite) equilibria and inherently large strategy spaces. In this paper, we introduce a method to reduce the strategy space of two-player general-sum RSGs to a handful of expert strategies. This process, called Mega, effectually reduces an RSG to a bandit problem. We show that the resulting strategy space preserves several important properties of the original RSG, thus enabling a learner to produce robust strategies within a reasonably small number of interactions. To better establish strengths and weaknesses of this approach, we empirically evaluate the resulting learning system against other algorithms in three different RSGs.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Texas (0.04)
- (2 more...)
- Energy (0.96)
- Leisure & Entertainment > Games (0.93)
Towards Optimal Patrol Strategies for Fare Inspection in Transit Systems
Jiang, Albert Xin (University of Southern California) | Yin, Zhengyu (University of Southern California) | Johnson, Matthew P. (University of Southern California) | Tambe, Milind (University of Southern California) | Kiekintveld, Christopher (University of Texas at El Paso) | Leyton-Brown, Kevin (University of British Columbia) | Sandholm, Tuomas (Carnegie Mellon University)
In some urban transit systems, passengers are legally required to purchase tickets before entering but are not physically forced to do so. Instead, patrol units move about through the transit system, inspecting tickets of passengers, who face fines for fare evasion. This setting yields the problem of computing optimal patrol strategies satisfying certain temporal and spatial constraints, to deter fare evasion and hence maximize revenue. In this paper we propose an initial model of this problem as a leader-follower Stackelberg game. We then formulate an LP relaxation of this problem and present initial experimental results using real-world ridership data from the Los Angeles Metro Rail system.
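The LP-relaxation idea in this abstract can be sketched as a tiny coverage LP. All numbers and the single budget constraint are hypothetical stand-ins for the paper's richer temporal and spatial constraints; the sketch only shows the general shape of optimizing marginal inspection coverage subject to patrol resources.

```python
import numpy as np
from scipy.optimize import linprog

# Toy relaxation (all figures hypothetical): choose a marginal inspection
# coverage c_e in [0, 1] for each station/time slot to maximize expected
# recovered revenue, subject to a total patrol-time budget.
ridership = np.array([120.0, 80.0, 50.0])   # riders per slot
fine = 2.0                                   # fine-to-fare ratio
time_cost = np.array([1.0, 1.0, 2.0])        # patrol time to cover a slot
budget = 2.0                                 # total patrol time available

# Expected revenue from slot e is ridership[e] * fine * c_e, linear in the
# coverage. linprog minimizes, so the objective is negated.
res = linprog(
    c=-(ridership * fine),
    A_ub=[time_cost],
    b_ub=[budget],
    bounds=[(0.0, 1.0)] * len(ridership),
)
coverage = res.x
print(coverage)
```

The LP naturally spends the patrol budget on the slots with the best revenue-per-patrol-hour ratio; in the real problem the fractional coverage must additionally be implementable as actual patrol schedules, which is where the temporal and spatial constraints enter.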
- North America > United States > California > Los Angeles County > Los Angeles (0.50)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > Texas > El Paso County > El Paso (0.04)
- (2 more...)
- Transportation > Passenger (1.00)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Rail (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)